Reproducible research

An introduction for the R novice


Welcome!



Richard Layton

Department of Mechanical Engineering
Rose-Hulman Institute of Technology
Fall 2016

Welcome


Trying to practice what I preach, I have made the course materials reproducible.


https://dsr-rhit.github.io/me497-reproducible-research/

Getting started


Introductions
Handouts


Write down your ideas in response to Mystery question 1:


What is reproducible research?

Practitioners tell us:


Research is reproducible when the data and the code used to obtain a finding are available and sufficient for an independent researcher to recreate the finding.


  • computational, data-intensive

  • spans the full data, analysis, & publication workflow

  • most of us have received only perfunctory training (if any)


Victoria Stodden, F. Leisch, & R. Peng, eds., Implementing Reproducible Research, CRC Press, 2014.
Christopher Gandrud, Reproducible Research with R and RStudio, 2/e, CRC Press, 2015.

Events tell us:


More accountability is needed because of

  • data falsification
  • erroneous analysis
  • misleading presentation of results


Karen EC Levy & David Merritt Johns, When open data is a Trojan Horse: The weaponization of transparency in science and governance, Big Data & Society, 2016.

Attempts to reproduce this work revealed . . .

the primary findings were false. The major effect disappeared after correcting for

  • coding errors

  • selective exclusion of available data

  • unconventional weighting of summary statistics


Kenneth Rogoff & Carmen Reinhart


Thomas Herndon, Michael Ash, & Robert Pollin, Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff, Political Economy Research Institute, U Mass Amherst, 2013.
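
The weighting bullet above is easier to see with numbers. The sketch below uses made-up growth rates for two hypothetical countries, A and B. It does not reproduce the calculation in either paper. It only shows that the same observations can give different averages depending on whether each country or each country-year observation gets equal weight.

```r
# Made-up growth rates (percent) for two hypothetical countries.
# Country A contributes four yearly observations, country B only one.
growth <- data.frame(
  country = c("A", "A", "A", "A", "B"),
  rate    = c(2, 2, 2, 2, -8)
)

# Scheme 1: average within each country first, then average the country means,
# so every country counts equally no matter how many years it contributes.
country_means   <- tapply(growth$rate, growth$country, mean)
mean_by_country <- mean(country_means)

# Scheme 2: average all country-year observations directly,
# so every observation counts equally.
mean_by_obs <- mean(growth$rate)

print(c(by_country = mean_by_country, by_observation = mean_by_obs))
```

With these invented numbers the first scheme gives -3 and the second gives 0. Because the choice lives in a script, an independent researcher can see it and question it.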

Attempts to reproduce this work revealed . . .

data were falsified to obtain the research outcomes he wanted, resulting in

  • retracted journal articles (11 to date)

  • terminated clinical trials

  • cancelled research funding

  • civil suit by patients


Anil Potti


Jason deBruyn, Trial involving disgraced scientist and bunk Duke research to begin Monday, Triangle Business Journal, 2015-01-23.
Ivan Oransky, It’s official: Anil Potti faked cancer research data, say Feds, Retraction Watch, 2015-11-07.

However, open science has also been “weaponized”

Scientists and skeptics are in a knife fight, and you don’t bring data to a knife fight.
— Paul Ehrlich

Why should I make the data available to you, when your aim is to try and find something wrong with it?
— Phil Jones


1000 years of temperature variation: the “hockey stick” graph by Michael Mann


Fred Pearce, Climate change debate overheated after sceptic grasped ‘hockey stick’, The Guardian, 2010-02-09.
Brad Keyes, Mann retirement: Analysis, reax, Climate Sceptic, 2016-05-08.
Jeff Leek, De-weaponizing reproducibility, 2015-03-13.

The primary beneficiary is you

If you do anything “by hand” once, you’ll do it 100 times.

— Paul Wilson, UW–Madison

Your closest collaborator is you, six months ago. Have you tried to email that slacker?

— Karl Broman, UW–Madison

To preserve sanity, stop collaborating via email, attachments, and tracking changes in Word.

— Jenny Bryan, UBC

Steps to take towards reproducibility

  • Write scripts (avoid manual copy, paste, mouse-clicks)

  • Plan the organization and naming scheme for files

  • Strive for simplicity, readability, reusability, and testability

  • Agree on a workflow for collaborating before starting a manuscript

  • DRY (don’t repeat yourself)

  • Link files explicitly

  • Plan data management

  • Use version control

  • Postpone optimization

  • License your software


Karl Broman, Initial steps toward reproducible research.
Jenny Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, Tracy Teal, and Greg Wilson, Good enough practices for scientific computing, 2016-01.
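
Three of the steps above (write scripts, link files explicitly, and DRY) can be sketched in a few lines of base R. The folder names, the file names, the `force_N` column, and the helper `summarize_column()` are assumptions made for illustration. They are not a layout prescribed by the course or by the references. The sketch builds its demo project in a temporary directory so that it runs anywhere.

```r
# Hypothetical project layout (an assumption for this sketch):
#   data/     raw data, never edited by hand
#   results/  files generated by scripts
project_dir <- file.path(tempdir(), "demo-project")  # stand-in for a real project folder
data_dir    <- file.path(project_dir, "data")
results_dir <- file.path(project_dir, "results")
dir.create(data_dir,    recursive = TRUE, showWarnings = FALSE)
dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)

# Made-up example data, written to disk so the script has a file to read.
raw <- data.frame(trial = 1:4, force_N = c(10, 20, 30, 40))
write.csv(raw, file.path(data_dir, "trials.csv"), row.names = FALSE)

# DRY: one function holds the summary logic, so it is written and fixed once.
summarize_column <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# Link files explicitly: the input path is stated in the script,
# not chosen by mouse-click in a file dialog.
trials        <- read.csv(file.path(data_dir, "trials.csv"))
summary_stats <- summarize_column(trials$force_N)

# Write the result with code instead of copying numbers by hand.
write.csv(
  data.frame(statistic = names(summary_stats), value = unname(summary_stats)),
  file.path(results_dir, "force_summary.csv"),
  row.names = FALSE
)
```

Rerunning the script regenerates `force_summary.csv` from the raw data, so no number in the results depends on a remembered manual step.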

Steps to take towards reproducibility in this course

The same ten steps listed above apply in this course.

Learning objectives


See the syllabus.


Start your week 0 assignments.

Consider a sample report


Imagine that you were the author of the “Load cell calibration report”


Carefully review the report and answer Mystery question 2:


Identify as many “manual operations” as possible.

Homework